Selective postponement of branch target buffer (BTB) allocation

ABSTRACT

A system and method provides branch target buffer (BTB) allocation. When a branch instruction is received, a branch target address that corresponds to the branch instruction is determined. A determination is made whether the branch target address is presently stored in a branch target buffer (BTB). When the branch target address is not presently stored in the branch target buffer, an entry in the branch target buffer is identified to receive the branch target address. A value in a field within the identified entry in the branch target buffer, such as a postponement flag (PF), is used to selectively override a replacement decision defined by predetermined branch target buffer allocation criteria. In one form, if a branch is taken, the identified entry is replaced with the branch target address in response to determining that the value in the field within the identified entry has a predetermined value.

CROSS-REFERENCE TO RELATED APPLICATION(S)

This application is related to Ser. No. 12/040210, filed on even date, entitled “Metric For Selective Branch Target Buffer (BTB) Allocation,” naming William C. Moyer and Jeffrey W. Scott as inventors, and assigned to the current assignee hereof.

BACKGROUND

1. Field

This disclosure relates generally to data processing systems, and more specifically, to using one or more metrics for selective BTB allocation.

2. Related Art

Branch target buffers have been used extensively to improve processor performance by reducing the number of cycles spent in execution of branch instructions. Branch target buffers act as a cache of recent branches and accelerate branches by providing either a branch target address (address of the branch destination) or one or more instructions at the branch target prior to execution of the branch instruction, which allows a processor to more quickly begin execution of instructions at the branch target address.

Branch lookahead schemes are also used to accelerate branch processing. They operate by scanning ahead into the sequential instruction stream, looking for upcoming branch instructions in advance of their execution, and computing the target addresses of those branches early, so that branch target instructions can be fetched before the branch instruction executes, in case the branch is taken.

Branch prediction logic may be used with both BTB and branch lookahead schemes to allow for an early prediction of the outcome (taken or not taken) of a conditional branch, prior to the resolution of the branch condition, thus allowing for increased branch performance when accuracy of the predictor is high.

Many current branch target buffer designs use an allocation policy that allocates an entry for every branch instruction encountered in the instruction stream. This approach tends to be inefficient, since not-taken branches are likely to be not taken in the future, and allocating an entry for them may displace future taken branch entries, thus lowering the hit rate of the branch target buffer.

Another approach waits to allocate an entry in the branch target buffer until it is known that a branch is actually taken, since a not-taken branch has a high probability of not being taken on the next execution. For larger branch target buffers, this may be a reasonable approach. However, for low-cost systems where the size of the branch target buffer must be minimized, an improved method of allocating new entries in the branch target buffer is desired.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and is not limited by the accompanying figures, in which like references indicate similar elements. Elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale.

FIG. 1 illustrates, in block diagram form, a data processing system having a BTB in accordance with one embodiment of the present invention.

FIG. 2 illustrates a state diagram for updating a predictor field in the BTB.

FIGS. 3-5 illustrate timing diagrams according to various examples of the instruction pipeline.

FIGS. 6 and 7 illustrate allocation decision tables in accordance with various embodiments of the present invention.

DETAILED DESCRIPTION

In various embodiments described herein, allocation of a taken branch in a branch target buffer (BTB) may be conditioned on a variety of different criteria. In one embodiment, the decision whether to allocate the taken branch in the BTB is based on information of the entry that is identified to be replaced in the BTB for allocation. For example, the decision whether to allocate may be based on a branch predictor state of the identified entry to be replaced. In one embodiment, the decision may be based on cycle saving information of the new branch that is to be allocated into the BTB, such as the cycles that could be saved on a subsequent access to the new branch if that new branch were in the BTB. In one embodiment, when a new branch is allocated into the BTB, this cycle savings information is stored as well. In this manner, the decision of whether or not to allocate may instead be based on a relative cycle saving between the new branch to be stored upon allocation into the BTB and the branch in the identified entry to be replaced upon allocation. Alternatively, other criteria, or a combination of these or other criteria, may be used. Also, one embodiment uses a postponement flag which allows the replacement of an entry in the BTB identified for replacement upon allocation to be postponed.
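The postponement idea can be illustrated with a small behavioral model. The following is a minimal Python sketch, not the claimed hardware, and it assumes one particular reading of the PF semantics that this passage does not spell out. Under that reading, when the allocation criteria reject a replacement and the victim's PF is clear, the replacement is postponed and PF is set. When the criteria reject a replacement but PF is already set, PF overrides the criteria and the replacement proceeds. The names `VictimEntry` and `decide_replacement` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VictimEntry:
    """Hypothetical model of the BTB entry identified for replacement."""
    valid: bool = True
    pf: bool = False  # postponement flag: True once a replacement of this entry was postponed


def decide_replacement(victim: VictimEntry, criteria_allow: bool) -> bool:
    """Return True if the new taken branch should overwrite `victim`.

    Assumed policy (illustrative only):
    - an invalid entry, or one the allocation criteria accept, is replaced;
    - otherwise, if PF is already set, PF overrides the criteria and the entry is replaced;
    - otherwise PF is set and the replacement is postponed.
    """
    if not victim.valid or criteria_allow:
        return True
    if victim.pf:
        return True  # replacement was already postponed once: override the criteria
    victim.pf = True  # first rejection: remember that the replacement was postponed
    return False
```

Under this assumed policy, an entry that the criteria protect survives one rejected allocation but is displaced on the next. This bounds how long a stale entry can block allocation. A newly written entry would presumably start with PF clear.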

As used herein, the term “bus” is used to refer to a plurality of signals or conductors which may be used to transfer one or more various types of information, such as data, addresses, control, or status. The conductors as discussed herein may be illustrated or described in reference to being a single conductor, a plurality of conductors, unidirectional conductors, or bidirectional conductors. However, different embodiments may vary the implementation of the conductors. For example, separate unidirectional conductors may be used rather than bidirectional conductors and vice versa. Also, a plurality of conductors may be replaced with a single conductor that transfers multiple signals serially or in a time multiplexed manner. Likewise, single conductors carrying multiple signals may be separated out into various different conductors carrying subsets of these signals. Therefore, many options exist for transferring signals.

The terms “assert” or “set” and “negate” (or “deassert” or “clear”) are used herein when referring to the rendering of a signal, status bit, or similar apparatus into its logically true or logically false state, respectively. If the logically true state is a logic level one, the logically false state is a logic level zero. And if the logically true state is a logic level zero, the logically false state is a logic level one.

Each signal described herein may be designed as positive or negative logic, where negative logic can be indicated by a bar over the signal name or an asterisk (*) following the name. In the case of a negative logic signal, the signal is active low where the logically true state corresponds to a logic level zero. In the case of a positive logic signal, the signal is active high where the logically true state corresponds to a logic level one. Note that any of the signals described herein can be designed as either negative or positive logic signals. Therefore, in alternate embodiments, those signals described as positive logic signals may be implemented as negative logic signals, and those signals described as negative logic signals may be implemented as positive logic signals.

Brackets are used herein to indicate the conductors of a bus or the bit locations of a value. For example, “bus 60 [7:0]” or “conductors [7:0] of bus 60” indicates the eight lower order conductors of bus 60, and “address bits [7:0]” or “ADDRESS [7:0]” indicates the eight lower order bits of an address value. The symbol “$” preceding a number indicates that the number is represented in its hexadecimal or base sixteen form. The symbol “%” preceding a number indicates that the number is represented in its binary or base two form.
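As a quick illustration of these conventions, the Python sketch below decodes `$` (hexadecimal) and `%` (binary) literals and extracts a bracketed bit field such as [7:0]. It treats bit 0 as the least significant bit, which matches the statement that [7:0] denotes the eight lower order bits. The helper names are hypothetical.

```python
def parse_literal(text: str) -> int:
    """Decode '$'-prefixed hexadecimal, '%'-prefixed binary, or plain decimal."""
    if text.startswith("$"):
        return int(text[1:], 16)
    if text.startswith("%"):
        return int(text[1:], 2)
    return int(text, 10)


def bit_field(value: int, hi: int, lo: int) -> int:
    """Return bits [hi:lo] of value, where bit 0 is the least significant bit."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


# "ADDRESS [7:0]" of $1234 is its eight lower order bits.
print(hex(bit_field(parse_literal("$1234"), 7, 0)))  # 0x34
```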

FIG. 1 illustrates, in block diagram form, a data processing system 100 in accordance with one embodiment of the present invention. Data processing system 100 includes memory 166, bus 168, and processor 184. Data processing system 100 may include other elements than those illustrated, or may include more or fewer elements than those illustrated. For example, data processing system 100 may include any number of memories, peripheral devices, or processors.

Processor 184 includes an instruction register (IR) 116, a branch address calculator (BAC) 108, a program counter 112, a multiplexer (MUX) 126, a latch 160, an adder 156, a multiplexer (MUX) 154, a branch target buffer (BTB) 144, decode and control logic (DCL) 164, instruction buffer 105, comparator 170, control logic 172, and prefetch buffer 102. Prefetch buffer 102 includes instruction slots S0, S1, S2, S3, S4, and S5. Instruction buffer 105 includes prefetch buffer 102 and instruction register 116. Processor 184 may be any type of processor, such as, for example, a microprocessor, microcontroller, digital signal processor, etc. In one embodiment, processor 184 may be referred to as a processor core. In another embodiment, processor 184 may be one of many processors in a multi-processor data processing system. Furthermore, processor 184 may be a pipelined processor.

Still referring to FIG. 1, prefetch buffer 102 is coupled to BAC 108, DCL 164, instruction register 116, and bus 168. BAC 108 is coupled to MUX 126, program counter 112, prefetch buffer 102, and instruction register 116. MUX 126 is coupled to program counter 112, BAC 108, MUX 154, adder 156, latch 160, and bus 168. BTB 144 is coupled to CTRL 172, comparator 170, and MUX 154. Comparator 170 is coupled to MUX 154, BTB 144, address bus 128, and DCL 164. DCL 164 is coupled to MUX 126, instruction register 116, comparator 170, prefetch buffer 102, memory 166, and CTRL 172. Memory 166 is coupled to bus 168 and DCL 164.

In operation, memory 166 contains a sequence of instructions, each instruction having a corresponding instruction address. During a clock cycle of processor 184, DCL 164 determines whether instruction buffer 105 has a predetermined number of slots available to store a predetermined number of instructions from memory 166. DCL 164 is able to determine whether there will be a predetermined number of slots available in instruction buffer 105 because DCL 164 is cognizant of the size of prefetch buffer 102, the number of reserved slots in instruction buffer 105, and the number of instructions currently being fetched from memory via bus 168. The predetermined number of slots may vary depending upon the pipelined processor being used and is dependent on the number of instructions fetched and the size of each instruction being fetched. For ease of explanation herein, it will be assumed that processor 184 uses a doubleword fetch size, the predetermined number of slots is two, and the predetermined number of instructions being fetched is two (i.e., two word-size instructions are requested during each doubleword instruction fetch). Alternate embodiments may use a different number of prefetch slots, may have a different pipeline, and may have different fetch sizes and memory latency than the embodiments described herein.
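The slot-availability check can be sketched as simple bookkeeping. This Python sketch makes several assumptions. It treats "reserved" slots as those already holding or earmarked for instructions, and it counts every in-flight instruction as occupying one slot when it arrives. It also uses the six-slot prefetch buffer (S0 to S5) and the two-instruction doubleword fetch from the example above. The function name is hypothetical.

```python
PREFETCH_SLOTS = 6   # slots S0..S5 of prefetch buffer 102
FETCH_COUNT = 2      # two word-size instructions per doubleword fetch


def can_issue_fetch(reserved_slots: int, in_flight: int,
                    slots: int = PREFETCH_SLOTS, fetch_count: int = FETCH_COUNT) -> bool:
    """Return True if enough slots will be free to hold another fetch.

    Assumption: every reserved slot and every instruction already requested
    over the bus will consume one slot, so only the remainder is available.
    """
    free = slots - reserved_slots - in_flight
    return free >= fetch_count
```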

In one embodiment, prefetch buffer 102 is used to hold sequential instructions in advance of their execution by processor 184. Instruction register 116 is used to hold the current instruction being decoded for execution. As instructions are executed, subsequent instructions are provided to instruction register 116 by way of conductors 114 from prefetch buffer slot 0, or from data bus 130. As these instructions advance through the pipeline, fetched instructions are shifted into slot 0 from slot 1, are shifted into slot 1 from slot 2, and so forth, assuming valid instructions are present in a previous slot. Empty slots in prefetch buffer 102 may be filled with requested instructions fetched from memory 166 by way of bus 168 and data bus 130.

In one embodiment of the present invention, branch address calculator 108 may be used to determine the slot 0 target address of the instruction in slot 0 (S0) of prefetch buffer 102 and the instruction register target address of the instruction in instruction register 116. For example, during a clock cycle, branch address calculator 108 receives the displacement fields of the instructions stored in S0 of prefetch buffer 102 and in instruction register 116, and the address of the instruction currently being executed from program counter PC 112. Branch address calculator 108 then calculates the slot 0 target address (S0TA 124) of the instruction in slot 0 and the instruction register target address (IRTA 120) of the instruction in instruction register 116. Either of IRTA, S0TA, etc. may be selected when DCL 164 determines whether an instruction stored in instruction buffer 105 and/or an instruction stored in slot 0 (S0) of prefetch buffer 102 is a branch instruction.
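A PC-relative interpretation of the branch address calculator can be sketched as follows. This paragraph does not give the exact formula, so the sketch makes four assumptions. PC 112 holds the address of the instruction in instruction register 116. The instruction in S0 is the next sequential word (PC + 4, given the word-size instructions assumed above). Displacements are signed byte offsets added to the branch's own address. Addresses are 32 bits wide. The function name is hypothetical.

```python
from typing import Tuple

ADDR_MASK = 0xFFFFFFFF  # assumption: 32-bit addresses
WORD_BYTES = 4          # assumption: word-size (4-byte) instructions


def branch_targets(pc: int, ir_disp: int, s0_disp: int) -> Tuple[int, int]:
    """Return (IRTA, S0TA) for the instructions in IR 116 and slot S0.

    pc      -- address of the instruction in the instruction register
    ir_disp -- signed displacement field of the IR instruction
    s0_disp -- signed displacement field of the S0 instruction
    """
    irta = (pc + ir_disp) & ADDR_MASK
    s0ta = (pc + WORD_BYTES + s0_disp) & ADDR_MASK  # S0 holds the next sequential word
    return irta, s0ta
```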

Prefetch buffer 102 allows for decoupling of memory 166 from instruction register 116, and acts as a first-in first-out (FIFO) queue of instructions. As long as the instruction execution stream remains sequential without a change of instruction flow, instructions continue to be requested sequentially and supplied to prefetch buffer 102. When a change of flow occurs (such as due to a change of flow instruction), the sequential instruction stream must be discarded, and prefetch buffer 102 is flushed of unused sequential instructions and is filled with a new stream of instructions from the target location of the change of flow. Branch instructions are typically used in processor 184 to cause a change of flow to occur to a new instruction stream, although additional events such as interrupts and exception processing may also cause a change of flow to occur. Change of flow events cause a disruption in the normal execution of instructions in processor 184 since the current instruction stream is discarded, and a new instruction stream established. This may cause processor 184 to stall instruction execution for one or more cycles while waiting for the new instruction stream to be established, thus possibly lowering overall performance and efficiency of data processing system 100.
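The FIFO behavior of prefetch buffer 102 can be modeled with a double-ended queue. This includes the shift toward S0 and the flush on a change of flow. This is a minimal Python sketch for illustration. The class and method names are hypothetical, and a real buffer would also track validity and in-flight requests.

```python
from collections import deque
from typing import Iterable, Optional


class PrefetchBuffer:
    """Six-slot FIFO; the left end of the deque is slot S0."""

    def __init__(self, slots: int = 6) -> None:
        self.slots = slots
        self.queue: deque = deque()

    def fill(self, instructions: Iterable[str]) -> None:
        """Append fetched instructions into the next empty slots."""
        for instr in instructions:
            if len(self.queue) >= self.slots:
                raise OverflowError("prefetch buffer full")
            self.queue.append(instr)

    def advance(self) -> Optional[str]:
        """Move S0 into the instruction register; later slots shift down by one."""
        return self.queue.popleft() if self.queue else None

    def change_of_flow(self, target_stream: Iterable[str]) -> None:
        """Discard unused sequential instructions and refill from the target."""
        self.queue.clear()
        self.fill(target_stream)
```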

Also, the contents of prefetch buffer 102 may be examined to determine the presence of a branch instruction which may cause a change of flow, in advance of the execution of the branch instruction by processor 184. By examining the instruction stream in advance of execution, it is possible that a branch instruction may be detected early enough before it is executed that the target instruction stream can be established in advance in order to minimize processor 184 stalls, thus possibly improving execution efficiency of data processing system 100. This is known as performing “branch lookahead”. The depth of lookahead required to avoid stall conditions is a function of the processor pipeline depth and memory access latency, and may vary in different embodiments of the current invention. Note that multiple branch instructions may be present in instruction register 116 and prefetch buffer 102 at any given time; therefore, DCL 164 may prioritize the selection of the branch target stream to be accessed. In one embodiment, DCL 164 scans instruction register 116 for a branch instruction first, and then scans prefetch buffer 102 slot 0 for a branch instruction, since this represents the logical order of instructions in the instruction stream. If a branch instruction is found in a higher priority location (e.g. instruction register 116), those in lower priority locations are temporarily ignored.
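The priority rule described here (instruction register first, then slot 0) can be expressed as a short selection function. The Python sketch assumes each location is described by a hypothetical `(is_branch, target_address)` pair, or `None` when the location is empty.

```python
from typing import Optional, Tuple

Slot = Optional[Tuple[bool, int]]  # (is_branch, target_address), or None if empty


def select_lookahead_target(ir: Slot, s0: Slot) -> Optional[Tuple[str, int]]:
    """Pick the branch target stream to fetch, honoring program order.

    The instruction register holds the logically earlier instruction, so a
    branch there wins; a branch in S0 is considered only if IR has none.
    """
    if ir is not None and ir[0]:
        return ("IRTA", ir[1])
    if s0 is not None and s0[0]:
        return ("S0TA", s0[1])
    return None  # no branch visible: continue sequential fetching
```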

In the illustrated embodiment, the lookahead depth is equal to one instruction prior to the branch instruction reaching the instruction register 116 for execution. In one embodiment, if the branch target can be calculated and an access request made to memory to obtain the target instruction stream one instruction cycle prior to the branch reaching the instruction register for execution, a stall may be avoided. This can be accomplished if the branch instruction is detected in slot 0 (S0) of prefetch buffer 102, since another instruction will be preceding it, namely that in instruction register 116. In this case, the branch target may be calculated, and a request made to memory to obtain the target instruction, which will arrive back to processor 184 in time to avoid additional execution stall cycles due to the delay incurred in accessing the target instruction stream.

However, not every branch instruction will necessarily be detectable in slot 0 of prefetch buffer 102, since branch instructions may bypass slot 0 and be loaded directly into instruction register 116 via data bus 130. This may occur when a change of flow occurs, and the target instruction itself is a branch instruction. Since the target instruction will be loaded directly into instruction register 116, it will not have been resident in slot 0 of prefetch buffer 102, and thus there will not have been an opportunity to fetch the target of that branch instruction early enough to avoid stalling processor 184. There may also be other reasons for a branch instruction to not hit slot 0 of prefetch buffer 102. For example, a data processing system with a unified memory bus could drain the instruction buffer 105 while executing consecutive load/store instructions or when waiting on instructions to arrive from memory 166. A subsequent fetch to fill the instruction register and slot 0 after execution of the load/store instructions in the instruction buffer 105 could result in a subsequent branch instruction not being processed in slot 0 of the prefetch buffer 102. Therefore, using branch lookahead with the prefetch buffer 102 may not be sufficient to avoid stalls within processor 184 caused by changes of flow.

Note that even if a branch instruction can be loaded into slot 0 of instruction buffer 105, there may not be an opportunity to begin fetching the target stream for that particular branch instruction, since it is possible that a different branch instruction is also present in instruction register 116. In this case, in one embodiment, the earlier branch in instruction register 116 will be given higher priority for memory access, even though the earlier branch may not actually be taken if the branch condition is resolved to indicate “not taken.” In this case, the target instruction stream for the first branch will be discarded, but the opportunity for fetching the target stream for a following branch which is resident in slot 0 of prefetch buffer 102 will have been missed.

It can be seen that using the branch lookahead technique can reduce processor 184 stall conditions if an upcoming change of flow instruction can be detected early enough, and a request made to memory to fetch the target stream, but there are frequent cases where it cannot. For these cases, a branch target buffer may be used to accelerate access to the target stream of a branch instruction. In one embodiment, BTB 144 holds precalculated target addresses for branch instructions which have been previously stored in the BTB. In one embodiment, each BTB entry in BTB 144 includes a branch instruction address (BIA), a branch target address (BTA), a valid (V) field, a prediction state (PRED) field, a cycles saved (CS) field, and a postponement (PF) field. In one embodiment, within each BTB entry, the BTA is the target address corresponding to the BIA, the V field indicates whether the entry is valid or not, the PRED field indicates whether the BIA corresponds to a strongly taken (ST), a weakly taken (WT), a weakly not taken (WNT), or a strongly not taken (SNT) branch, the CS field indicates a number of cycles saved by the entry, and the PF field indicates if allocation of the entry has been previously postponed. These fields will be discussed in more detail below. Alternatively, note that PRED, CS, and PF may not be present in the BTB entries.
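The entry format can be summarized as a record. In this Python sketch the PRED encodings follow the state diagram of FIG. 2 (00 = SNT, 01 = WNT, 10 = WT, 11 = ST). The default values and the `predicts_taken` helper are illustrative assumptions, and field widths are not modeled.

```python
from dataclasses import dataclass
from enum import IntEnum


class Pred(IntEnum):
    """Two-bit prediction state stored in the PRED field."""
    SNT = 0b00  # strongly not taken
    WNT = 0b01  # weakly not taken
    WT = 0b10   # weakly taken
    ST = 0b11   # strongly taken


@dataclass
class BTBEntry:
    bia: int = 0             # branch instruction address
    bta: int = 0             # branch target address
    valid: bool = False      # V field
    pred: Pred = Pred.SNT    # PRED field
    cs: int = 0              # CS field: cycles saved by this entry
    pf: bool = False         # PF field: allocation previously postponed

    def predicts_taken(self) -> bool:
        return self.pred in (Pred.WT, Pred.ST)
```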

Since calculation of the target address requires time, in one embodiment, a target address can be provided by BTB 144 one or more clock cycles earlier than if the branch lookahead technique is used. For each instruction address provided to address bus 128 via MUX 126, comparator 170 performs a lookup to determine if the instruction address hits within BTB 144 (i.e. if the instruction address matches the BIA of an entry which is marked as valid by the V field) and asserts BTB entry hit signals 196 to DCL 164 to indicate the hitting entry for later updating of the PRED information. If the hitting entry is also indicated to be a taken branch (i.e. if the PRED field indicates ST or WT), comparator 170 asserts comparator match signal 197. The target address corresponding to the instruction address which hit in BTB 144 is obtained from the BTA field of the matching entry, and a memory request may then be made to obtain the target instruction. In this case, MUX 154 selects the appropriate entry from BTB 144 via control signal 180 sent by comparator 170, such that MUX 154 provides the BTA of the matching entry, BTBTA 178, to MUX 126. As will be described in more detail below, DCL 164, in response to assertion of the comparator match signal 197, provides control signal 166 to MUX 126 to select BTBTA 178 and provide it to address bus 128 so that it may be provided to memory 166 via bus 168. Thus, if there is a hit in BTB 144 and the prediction from the hitting entry indicates “taken” (e.g. ST or WT), the target instruction stream can be established prior to the branch arriving into prefetch buffer 102 or instruction register 116. However, note that it is also possible that the incoming instruction address which results in a hit within BTB 144 may not actually turn out to be a branch instruction. In this case, known techniques may be used to correct the instruction stream and pipeline, as needed.
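The lookup performed by comparator 170 can be sketched in Python as a search over valid entries. The sketch assumes a fully associative BTB, since the organization is not specified here, and reduces each entry to a hypothetical tuple. It returns the hit index, which plays the role of BTB entry hit signals 196. It also returns the target, but only when the entry predicts taken, which plays the role of comparator match signal 197 selecting BTBTA 178.

```python
from typing import List, NamedTuple, Optional, Tuple

SNT, WNT, WT, ST = 0b00, 0b01, 0b10, 0b11


class Entry(NamedTuple):
    bia: int
    bta: int
    valid: bool
    pred: int


def btb_lookup(btb: List[Entry], addr: int) -> Tuple[Optional[int], Optional[int]]:
    """Return (hit_index, predicted_target).

    hit_index is None on a miss. predicted_target is the BTA only when the
    hitting entry predicts taken (WT or ST); otherwise fetching stays sequential.
    """
    for index, entry in enumerate(btb):
        if entry.valid and entry.bia == addr:
            taken = entry.pred in (WT, ST)
            return index, (entry.bta if taken else None)
    return None, None
```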

Also, when an instruction address which resulted in a hit in BTB 144 is determined to be a branch and is resolved as taken or not taken, the corresponding PRED field in the entry that matches that branch instruction is updated according to the state diagram of FIG. 2 via BTB update signals 155 from DCL 164 to control 172. Note that FIG. 2 illustrates a state diagram 200 for prediction states of each entry within BTB 144. The state diagram includes four states: state 00 (SNT), state 01 (WNT), state 10 (WT), and state 11 (ST). When the branch is resolved as taken, the current state is changed to the state which appears immediately to the right of the current state in the state diagram, and when the branch is resolved as not taken, the current state is changed to the state which appears immediately to the left of the current state in the state diagram. If the PRED field already indicates state 11 and the branch which caused a hit of that entry is resolved as taken, then the PRED field remains in the same state. Similarly, if the PRED field already indicates state 00 and the branch which caused a hit of that entry is resolved as not taken, then the PRED field remains in the same state. Note that alternate embodiments may include more or fewer prediction states, and may cycle through the states differently. Any prediction scheme may be used. Furthermore, in one embodiment, the prediction scheme for branches may be implemented outside of BTB 144, such as within DCL 164, where BTB 144 may not include PRED fields. In another alternate embodiment, no branch prediction scheme may be used at all within processor 184.
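The four-state diagram of FIG. 2 behaves as a two-bit saturating counter. A minimal Python sketch of the update, using the same 00 to 11 encodings, is:

```python
SNT, WNT, WT, ST = 0b00, 0b01, 0b10, 0b11


def update_pred(state: int, taken: bool) -> int:
    """Move one state right on taken, one state left on not taken, saturating at the ends."""
    if taken:
        return min(state + 1, ST)
    return max(state - 1, SNT)


# A branch in WNT that is taken twice ends in ST.
s = WNT
for outcome in (True, True):
    s = update_pred(s, outcome)
print(s == ST)  # True
```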

Upon a branch instruction being resolved as taken (where this branch instruction did not result in a hit in BTB 144), it is determined whether it is to be allocated. In one embodiment, for a BTB allocation, an entry in BTB 144 is first identified for replacement, and, if the new taken branch is to be allocated, it is stored in the entry that was identified for replacement. In one embodiment, an entry of BTB 144 is identified for replacement using any known method, such as, for example, a least recently used (LRU) algorithm, a modified LRU algorithm, a round robin algorithm, etc. However, once an entry is identified for replacement, it is then determined whether or not the replacement (i.e. the allocation) is to occur. That is, DCL 164 can selectively allocate a new taken branch based on a variety of different factors, such as, for example, cycle saving information, predictor state information, postponement information, or combinations thereof. (Although, in the examples described herein, only taken branches are selectively allocated into BTB 144, alternate embodiments may selectively allocate, based on particular criteria, all resolved branches into BTB 144, including those resolved as not taken.)
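The two-step flow (identify a victim first, then decide whether to replace it) can be sketched with round-robin victim selection, one of the methods listed above. In this Python sketch the `allow` callback stands in for whatever selective-allocation criteria DCL 164 applies. The CS value is stored together with the new branch when allocation occurs. Advancing the round-robin pointer only when an allocation actually occurs is an assumption. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Entry:
    bia: int
    bta: int
    cs: int  # cycles saved, stored at allocation time


class RoundRobinBTB:
    def __init__(self, size: int) -> None:
        self.entries: List[Optional[Entry]] = [None] * size
        self.next_victim = 0

    def identify_victim(self) -> int:
        """Step 1: pick the entry that would be replaced."""
        return self.next_victim

    def on_taken_miss(self, bia: int, bta: int, cs: int,
                      allow: Callable[[Optional[Entry]], bool]) -> Optional[int]:
        """Step 2: allocate only if the criteria accept replacing the victim.

        Returns the index written, or None if allocation was declined.
        """
        index = self.identify_victim()
        if not allow(self.entries[index]):
            return None  # victim kept; pointer not advanced (assumption)
        self.entries[index] = Entry(bia, bta, cs)
        self.next_victim = (index + 1) % len(self.entries)
        return index
```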

In one embodiment, DCL 164 uses cycle saving information to determine whether or not a taken branch which did not hit in BTB 144 (i.e. which missed in BTB 144) is to be allocated an entry in BTB 144. (Note that the term “cycle” may also be referred to as a data processor cycle, a processor cycle, or a clock cycle.) This cycle saving information may refer to cycle saving information relating to the branch instruction stored within the entry that has been identified for replacement, or to the new taken branch that is to be selectively allocated, or to a relative measurement of cycle savings between the two. In one embodiment, each time DCL 164 determines that a branch instruction should be allocated an entry in BTB 144 (regardless of the criteria used to make this determination), then its corresponding cycle savings information is also stored within the allocated entry in the CS field. In one embodiment, DCL 164 determines the number of cycles that would be saved on a subsequent access of the new branch instruction if that instruction were stored in BTB 144 and provides this number via BTB entry information 175 to control 172 such that when load BTB 174 is asserted by DCL 164 to store the new branch instruction in BTB 144, the number of cycles saved can also be stored in the CS field of the entry. In an alternate embodiment, BTB 144 may not include the CS field, in which case this cycle savings information may be stored elsewhere, outside of BTB 144.

In one embodiment, DCL 164 determines a number of cycles (or an estimated number of cycles) that would be saved on a subsequent access of the branch instruction if that branch instruction were stored in BTB 144. In this example, DCL 164 may choose to allocate an entry for the taken branch only if the cycle saving is above a predetermined threshold. In another embodiment, the determination of whether or not to allocate an entry in BTB 144 may be made by comparing cycle saving information relating to the taken branch that is to be allocated versus the cycle saving information relating to the branch in the entry of BTB 144 that would be replaced with this new allocation. For example, DCL 164 may determine a difference between a savings in data processing cycles obtained by a presently stored target address in the entry to be replaced and a savings in data processing cycles that would be realized in response to a subsequent access of the branch target address if that branch target address were allocated or stored in that entry. In this example, DCL 164 may allocate the branch instruction only if the difference is greater than a predetermined threshold. Also, note that the cycle saving information relating to the branch in the entry of BTB 144 that would be replaced can be provided by CS 192 to DCL 164, where CS 192 provides the value of the CS field corresponding to the entry identified for replacement. In yet another embodiment, the determination of whether or not to allocate an entry in BTB 144 may be made based on the cycle saving information corresponding to the branch instruction of the entry that would be replaced upon allocation. This information can be provided by CS 192 to DCL 164. In this example, DCL 164 may allocate the branch instruction only if the cycle saving information corresponding to the branch instruction already stored in the entry to be replaced is less than a predetermined threshold.
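Each of the three cycle-saving policies in this paragraph can be written as a one-line predicate. In this Python sketch, `new_cs` is the (estimated) number of cycles the new taken branch would save. `victim_cs` is the CS value of the entry identified for replacement, as supplied by CS 192. The threshold values are hypothetical parameters. The comparisons follow the wording above ("above", "greater than", "less than").

```python
def alloc_if_new_saves_enough(new_cs: int, threshold: int) -> bool:
    """Allocate only if the new branch's cycle saving is above the threshold."""
    return new_cs > threshold


def alloc_if_gain_exceeds(new_cs: int, victim_cs: int, threshold: int) -> bool:
    """Allocate only if (new saving - victim's stored saving) is greater than the threshold."""
    return new_cs - victim_cs > threshold


def alloc_if_victim_saves_little(victim_cs: int, threshold: int) -> bool:
    """Allocate only if the victim's stored saving is less than the threshold."""
    return victim_cs < threshold
```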

Therefore, different types of cycle saving information may be used to determine whether a branch that is determined to be a taken branch is allocated into BTB 144. The cycle saving information may include, for example, cycle saving information related to the branch instruction already stored in the entry to be replaced, cycle saving information related to the taken branch instruction that is to be allocated into BTB 144, or a relative measurement of cycle savings between the taken branch instruction that is to be allocated and the branch instruction already stored in BTB 144 that would be replaced by the allocation of the taken branch instruction. In one embodiment, the cycle savings information may be based on the number of processor stall cycles which may be saved if the entry hits in BTB 144 on a future lookup, or may be based on other performance or power related factors, such as the number of memory accesses saved by obtaining the branch target address from the BTB. In some embodiments, minimizing the number of stall cycles may be an important performance criterion, but in other embodiments, minimizing utilization of bus 168, minimizing the number of instruction fetch cycles to memory 166, or minimizing the number of discarded instruction prefetches may be of primary importance, particularly if there is a direct correspondence to overall power consumption. Any of this cycle savings information may be stored within BTB 144 or elsewhere within system 100. The cycle savings information may also be a function of one or more of these factors in a weighted combination, where the weightings of each factor may be predetermined, or may be dynamically determined as the execution performed by system 100 occurs. The dynamic determination may be made by profiling hardware or software contained within system 100.
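A weighted combination of the factors listed above might look like the Python sketch below. The three inputs (stall cycles saved, memory fetch accesses saved, discarded prefetches avoided) come from this paragraph. The function name and the default weights, which count stall cycles only, are assumptions. In a dynamic scheme, profiling hardware or software would supply the weights instead.

```python
from typing import Tuple


def cycle_savings_metric(stall_cycles: float, fetches_saved: float,
                         discards_avoided: float,
                         weights: Tuple[float, float, float] = (1.0, 0.0, 0.0)) -> float:
    """Weighted sum of performance and power related savings for one branch."""
    w_stall, w_fetch, w_discard = weights
    return (w_stall * stall_cycles
            + w_fetch * fetches_saved
            + w_discard * discards_avoided)


# Stall-only weighting versus a power-aware weighting of the same branch.
print(cycle_savings_metric(3, 1, 2))                    # 3.0
print(cycle_savings_metric(3, 1, 2, (1.0, 0.5, 0.25)))  # 4.0
```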

In yet another embodiment, the determination of whether or not to allocate an entry in BTB 144 may be made based on a prediction state of the entry in BTB 144 to be replaced by the taken branch. As described above, the PRED field of each entry may indicate whether the corresponding branch instruction address is predicted to be ST, WT, WNT, or SNT. In one embodiment, the taken branch is allocated only if the entry in BTB 144 to be replaced is indicated to be WNT or SNT by its corresponding PRED field. Alternatively, the taken branch is allocated only if the entry in BTB 144 to be replaced is indicated to be WNT, SNT, or WT by its corresponding PRED field.
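Both variants in this paragraph reduce to a set-membership test on the victim's PRED value. The following is a minimal Python sketch. The flag name `allow_weakly_taken` is a hypothetical switch between the two variants.

```python
SNT, WNT, WT, ST = 0b00, 0b01, 0b10, 0b11


def alloc_by_victim_pred(victim_pred: int, allow_weakly_taken: bool = False) -> bool:
    """Allocate the new taken branch only if the victim is predicted not taken
    (or, in the alternative variant, anything except strongly taken)."""
    allowed = {SNT, WNT}
    if allow_weakly_taken:
        allowed.add(WT)
    return victim_pred in allowed
```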

In one embodiment of the present invention, the determination of whether or not to allocate an entry in BTB 144 is based on both cycle savings information and the prediction state of the entry in BTB 144 to be replaced. For example, FIG. 6 illustrates an allocation decision table 201 that may be implemented by DCL 164. Based on particular factors, table 201 provides criteria used to determine whether or not to replace the identified entry of BTB 144. Each entry in table 201 therefore corresponds to a replacement decision for a particular criterion or set of criteria. Consider the identified entry of BTB 144, which is the entry to be replaced if the new branch instruction is allocated in BTB 144. If that entry has a PRED field value of 00 (indicating SNT), allocation is performed regardless of the cycle savings of the new branch relative to the existing entry. In this case, the new branch instruction, which was resolved as taken, is stored in the identified entry of BTB 144 along with its corresponding cycle saving value in the CS field. Again, the cycle saving value corresponds to the number of processor cycles, or an estimate thereof, that would be saved if the branch instruction were stored in BTB 144. However, if the identified entry of BTB 144 that is to be replaced has a PRED value of 01 (indicating WNT), 10 (indicating WT), or 11 (indicating ST), then allocation selectively occurs based on the cycle savings of the new branch relative to the existing entry. In these cases, DCL 164 determines the number of processor cycles, or an estimate thereof, that would be saved if the branch instruction were stored in BTB 144. It then subtracts the CS value of the identified entry (provided by CS 192). According to table 201, if the PRED field is 01 and that difference is 1 or more, then replacement occurs. If the PRED field is 10 and that difference is 2 or more, then replacement occurs. If the PRED field is 11 and that difference is greater than 2, then replacement occurs. If the PRED field is 00, then replacement always occurs.
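The thresholds described for table 201 can be expressed as a small decision function. This is a minimal sketch, assuming integer cycle-savings values. The function name `table_201_replace` is hypothetical. The sketch covers only the rule as stated in the text, not how DCL 164 encodes it (look-up table, combinational logic, or otherwise).

```python
# 2-bit PRED encodings used by table 201.
SNT, WNT, WT, ST = 0b00, 0b01, 0b10, 0b11


def table_201_replace(victim_pred: int, new_cs: int, victim_cs: int) -> bool:
    """Replacement decision following the thresholds described for table 201.

    new_cs:    estimated cycles saved if the new taken branch is placed in the BTB
    victim_cs: CS field of the entry identified for replacement
    """
    diff = new_cs - victim_cs  # relative cycle savings of the new branch
    if victim_pred == SNT:
        return True            # 00: always replace
    if victim_pred == WNT:
        return diff >= 1       # 01: replace if diff is 1 or more
    if victim_pred == WT:
        return diff >= 2       # 10: replace if diff is 2 or more
    return diff > 2            # 11 (ST): replace only if diff exceeds 2


# Hypothetical case: a new branch saving 2 cycles meets a WT victim with CS = 1.
print(table_201_replace(WT, new_cs=2, victim_cs=1))  # False (diff = 1 < 2)
```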

Therefore, note that in the embodiment of FIG. 6, the more likely it is that the branch instruction currently stored in the entry identified for replacement in BTB 144 will be taken on a subsequent access, the more difficult it becomes for a new taken branch to replace it. For example, if the branch instruction currently stored in the identified entry is predicted to be ST, the new branch instruction must save more than 2 processor cycles beyond the savings of the currently stored branch instruction before the replacement is allowed to occur. Alternate embodiments may use different table formats and different combinations of factors to make the determination of whether or not to allocate (i.e. to make the replacement decisions). For example, in one embodiment, table 201 may include only a single row, in which only the cycle savings of the new branch relative to the existing entry is taken into consideration and not the predictor state. In this example, the table may include a single row similar to row 200 or 202 of table 201, in which the decision to replace an existing entry is based on whether at least a predetermined threshold number of cycles is saved. In another example, table 201 may include only a single column, in which only the predictor state of the existing entry is taken into consideration and not the relative cycle savings. In this example, the table may include a single column similar to column 204 or 206 of table 201. Another embodiment is provided with the table of FIG. 7, which will be described in more detail below.

In one embodiment of the present invention, branch address calculator 108 may be used to determine the slot target addresses of the instructions in prefetch buffer 102 and the instruction register target address of the instruction in instruction register 116. For example, during a clock cycle, branch address calculator 108 receives the displacement fields of the instructions stored in prefetch buffer 102 and instruction register 116. It also receives the address of the instruction currently being executed from program counter (PC) 112. Branch address calculator 108 then calculates the slot 0 target address (S0TA) of the instruction in slot 0 and the instruction register target address (IRTA) of the instruction in instruction register 116. Any of IRTA, S0TA, etc. may be selected when DCL 164 determines that an instruction stored in instruction buffer 105 is a branch instruction, as will be described further below.

Operation of at least one embodiment of the present invention shown in FIG. 1 will be described below with reference to the timing diagrams shown in FIG. 3, FIG. 4, and FIG. 5. It is assumed that, at the beginning of the first clock cycle of each of the timing diagrams shown in FIG. 3 and FIG. 4, load BTB 174 and comparator match signal 197 are deasserted. Also, for FIG. 3, it is assumed that the branch instruction BR $30, stored at address $8, misses in BTB 144. For FIG. 4, it is assumed that the branch instruction hits in BTB 144.

Referring to FIGS. 1 and 3, during the first clock cycle, DCL 164 determines whether two slots are available in instruction buffer 105. When DCL 164 determines that two slots are available in instruction buffer 105 (e.g., instruction register 116 and slot 0), request signal 199 is asserted and the two slots, instruction register 116 and slot 0, are reserved. Request signal 199 is provided to memory 166 from DCL 164 and is used to request the instructions being fetched from memory 166. The instruction address corresponding to the initial instruction being fetched is provided by program counter 112 to MUX 126 via instruction address (IA) 123. DCL 164 uses MUX 126 to select the initial instruction address 123. That address is then driven onto address bus 128 to simultaneously request instructions I0 and I1, located at address $0 and address $4 in memory 166. A section of example instruction addresses and the corresponding instructions is illustrated in table format in the columns labeled address and data, respectively.

The instruction address driven onto address bus 128, $0, is provided to latch 160, comparator 170, and memory 166. Latch 160, which is coupled to address bus 128 and adder 156, captures the instruction address driven onto address bus 128. Adder 156 then increments the current instruction address by the doubleword fetch size, $8. As stated previously, the doubleword fetch size may vary in different embodiments of the present invention. It depends on the size of each instruction being fetched and the number of instructions fetched in one request. Latch 160 captures the address on address bus 128, and adder 156 increments it, during every clock cycle in which an address is driven onto address bus 128 and a request is made to memory. The incremented address, in this case $8, is output by adder 156 as sequential instruction address (SIA) 158.
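The latch-and-increment path can be modeled in a few lines to check the sequential addresses that appear in FIG. 3. This is a minimal sketch, assuming a fixed 8-byte fetch size (two 4-byte instructions per request, as in the example). The name `next_sequential_address` is hypothetical.

```python
FETCH_SIZE = 0x8  # doubleword fetch: two 4-byte instructions per request (example value)


def next_sequential_address(bus_addr: int, fetch_size: int = FETCH_SIZE) -> int:
    """Model of latch 160 + adder 156: SIA 158 = captured bus address + fetch size."""
    return bus_addr + fetch_size


# Sequential fetch addresses of FIG. 3, starting from $0.
addrs = [0x0]
for _ in range(3):
    addrs.append(next_sequential_address(addrs[-1]))
print([f"${a:X}" for a in addrs])  # ['$0', '$8', '$10', '$18']
```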

Comparator 170 receives the address driven onto address bus 128 and compares the address to the branch instruction address or addresses, if any, stored in branch target buffer 144. Suppose comparator 170 determines that the address driven onto address bus 128 matches one of the branch instruction addresses stored in BTB 144, and that the associated entry is valid, as indicated by the V field. In that case, a BTB hit has occurred and comparator 170 asserts BTB hit signals 196. If the branch is also predicted to be taken, comparator 170 asserts comparator match signal 197. Comparator 170 also uses signal 180 and MUX 154 to select the branch target address corresponding to the branch instruction address that generated the BTB hit. The selected branch target address is provided to MUX 126 using BTBTA 178. Since comparator match signal 197 is asserted, DCL 164 selects BTBTA 178 and drives the branch target address onto address bus 128.

A BTB miss occurs in either of two cases: the address driven onto address bus 128 does not match any branch instruction address in BTB 144, or it matches a branch instruction address in BTB 144 whose entry is invalid. On a BTB miss, comparator 170 deasserts BTB hit signals 196 and comparator match signal 197. When comparator match signal 197 is deasserted, DCL 164 does not select BTBTA 178 as the address to be driven onto address bus 128. In the example shown in FIG. 3, a BTB miss has occurred during the first clock cycle; hence, comparator match signal 197 is deasserted and BTBTA 178 is not selected by DCL 164.
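The hit/miss rules above can be sketched as a lookup over a small list of entries. This is a minimal sketch under stated assumptions, not the circuit. The names `BTBEntry` and `btb_lookup` are hypothetical. "Predicted taken" is assumed to mean PRED = 10 (WT) or 11 (ST). The sketch searches sequentially, whereas comparator 170 compares entries in parallel.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BTBEntry:
    bia: int     # branch instruction address
    bta: int     # branch target address
    valid: bool  # V field
    pred: int    # 2-bit PRED field; 0b10 (WT) and 0b11 (ST) mean predicted taken


def btb_lookup(btb: List[BTBEntry], addr: int) -> Tuple[bool, Optional[int]]:
    """Hypothetical model of comparator 170.

    Returns (hit, target): hit mirrors BTB hit signals 196; target is non-None
    only when the hit is also predicted taken (comparator match signal 197),
    in which case it is the address that would be sent out as BTBTA 178.
    """
    for entry in btb:
        if entry.valid and entry.bia == addr:   # match on a valid entry only
            taken = entry.pred >= 0b10
            return True, (entry.bta if taken else None)
    return False, None                          # no match, or only invalid matches


btb = [BTBEntry(bia=0x8, bta=0x30, valid=True, pred=0b10)]
print(btb_lookup(btb, 0x8))  # (True, 48), i.e. target $30
print(btb_lookup(btb, 0x0))  # (False, None)
```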

During the second clock cycle of FIG. 3, DCL 164 determines whether two slots in instruction buffer 105 are available for storing two instructions. Two slots are available in prefetch buffer 102, in this case slot 1 and slot 2, so DCL 164 asserts request signal 199, reserves slots 1 and 2, and selects SIA 158. The sequential instruction address, $8, is driven onto address bus 128 and provided to latch 160, comparator 170, and memory 166. Latch 160 captures the instruction address driven onto address bus 128, and adder 156 increments the captured instruction address by $8, yielding a sequential instruction address of $10. Comparator 170 determines whether a BTB hit or a BTB miss has occurred between the address driven onto address bus 128, $8, and the valid branch instruction addresses, if any, in BTB 144. In this case, the address driven onto address bus 128 does not match any address stored in the branch instruction address portion of BTB 144. A BTB miss therefore occurs, and BTB hit signals 196 and comparator match signal 197 are deasserted. The instructions I2 and I3, corresponding to the sequential instruction address $8 driven onto address bus 128, are fetched and provided to prefetch buffer 102 via bus 168 and data bus 130 during the fourth clock cycle.

During the third clock cycle, instructions I0 and I1, corresponding to the instruction address driven onto address bus 128 during the first clock cycle, are driven onto data bus 130. They are loaded into instruction register 116 and slot 0 of prefetch buffer 102 during the fourth clock cycle. DCL 164 determines that two slots of prefetch buffer 102, slots 3 and 4, are available to store two instructions from memory 166. It asserts request signal 199, reserves slots 3 and 4, and selects SIA 158. The selected sequential instruction address, $10, is driven onto address bus 128 and provided to latch 160, comparator 170, and memory 166. Latch 160 captures the instruction address driven onto address bus 128, and adder 156 increments the captured instruction address, yielding a sequential instruction address of $18. Comparator 170 determines whether a BTB hit or a BTB miss has occurred between the address driven onto address bus 128, $10, and the branch instruction addresses, if any, stored in BTB 144. In this case, the V fields of all entries in BTB 144 indicate that the entries are invalid. A BTB miss therefore occurs, BTB hit signals 196 and comparator match signal 197 are deasserted, and comparator 170 does not select a BTBTA 178 corresponding to a BIA.

During the fourth clock cycle, DCL 164 determines whether two slots in prefetch buffer 102 are available to store two instructions from memory 166. In this case, the three fetches made during the first three clock cycles (at two instructions per fetch) have filled or reserved six slots in prefetch buffer 102. Two slots are therefore not available to store two additional instructions from memory 166. Hence, request signal 199 is deasserted, no additional slots are reserved, and DCL 164 selects SIA 158. Since SIA 158 is selected, the sequential instruction address, $18, is driven onto address bus 128 and provided to latch 160, comparator 170, and memory 166. However, because request signal 199 is deasserted, the instructions corresponding to address $18 are not requested during the fourth clock cycle, since two slots are not available in prefetch buffer 102. Comparator 170 receives the non-requested address and, in one embodiment, compares it to the entries of BTB 144. DCL 164, however, ignores any comparator match signal 197 sent by comparator 170 when request signal 199 is deasserted. In an alternate embodiment, since no memory request is made, comparator 170 makes no comparison.

Instruction I0, requested during the first clock cycle, is loaded into instruction register 116 in cycle 4. Instruction I1, also requested during the first clock cycle, is loaded into slot 0 in the same cycle. Instructions I2 and I3, corresponding to the instruction address $8 driven onto address bus 128 during the second clock cycle, are placed on bus 168 and data bus 130. They are loaded into slot 0 and slot 1 of prefetch buffer 102 at the beginning of the fifth clock cycle.

DCL 164 receives opcode 176 of instruction I0 from instruction register 116 and determines whether the instruction is a branch instruction. DCL 164 knows the opcodes of the branch instructions used in data processing system 100 and can compare the received opcode 176 to the branch opcodes of processor 184. When DCL 164 determines that instruction I0 in instruction register 116 is not a branch instruction, it uses opcode 190 of instruction I1 in slot 0 to determine whether the instruction loaded into slot 0 is a branch instruction. When DCL 164 determines that there is no branch instruction in instruction register 116 or in slot 0 of prefetch buffer 102, the current cycle ends without branch processing and processor 184 continues to the fifth clock cycle.

During the fifth clock cycle, DCL 164 determines whether two slots in prefetch buffer 102 are available to store two instructions from memory 166. Two slots are available, in this case slot 4 and slot 5, so request signal 199 is asserted and slots 4 and 5 are reserved. Instructions I4 and I5, corresponding to the instruction address $10 driven onto address bus 128 during the third clock cycle, are placed on bus 168 and data bus 130. They are loaded into slot 1 and slot 2 of prefetch buffer 102 during the sixth clock cycle. Instruction I0, present in instruction register 116 during the fourth clock cycle, is replaced with instruction I1 from slot 0. Instructions I2 and I3, requested during the second cycle, are loaded into slot 0 and slot 1, respectively. As instructions are executed, subsequent instructions are shifted forward (toward instruction register 116) into the next slot. Instructions fetched from memory are not necessarily loaded into the slots reserved when the fetch request was made, because instructions already in the prefetch buffer may be moving forward toward instruction register 116 as earlier instructions are executed. Instead, the fetched instructions are loaded into the slots that correspond to the progression of the instructions preceding them.

The opcodes 176 and 190 of instructions I1 and I2, loaded into instruction register 116 and slot 0, are provided to DCL 164. DCL 164 receives opcode 176 of instruction I1 stored in instruction register 116 and determines whether the instruction is a branch instruction. When DCL 164 determines that instruction I1 in instruction register 116 is not a branch instruction, DCL 164 uses opcode 190 to determine whether instruction I2 in slot 0 is a branch instruction.

When DCL 164 determines that the instruction loaded into slot 0 is a branch instruction (BR $30), DCL 164 determines whether BTB hit signals 196 were asserted, indicating a BTB hit. In FIG. 3, BTB hit signals 196 are deasserted, indicating a BTB miss. In that case, DCL 164 determines whether a condition for stall signal in DCL 164 is asserted. The condition for stall signal in DCL 164 indicates whether processor 184 has stalled for reasons related to, for example, an execution dependency of an instruction on a prior instruction. In FIG. 3, the condition for stall signal in DCL 164 is deasserted. Branch address calculator 108 therefore uses the displacement of the branch instruction in slot 0 and the output of program counter 112 to generate slot 0 target address (S0TA) 124, which is used to prefetch the instructions at the branch target address. DCL 164 then uses MUX 126 to select S0TA 124 and drive the branch target address, $30, onto address bus 128. The target instruction T0, stored at the branch target address, is then returned via bus 168 and data bus 130 during the seventh clock cycle, along with the next sequential instruction, T1. In this case, an entry in the BTB will be allocated for the branch if it is resolved to be taken. On a subsequent encounter of the branch instruction, a BTB hit may then occur, and the branch target fetch may occur two cycles earlier, without waiting for branch address calculator 108 to generate the S0TA value.

During the sixth clock cycle, the following loads take place:
- Branch instruction I2, which was in slot 0 during the fifth clock cycle, is loaded into instruction register 116.
- Instruction I3 is loaded into slot 0 from slot 1.
- I4 is loaded into slot 1 from data bus 130.
- I5 is loaded into slot 2 from data bus 130.

DCL 164 uses opcode 176 to determine whether instruction I2 in instruction register 116 is a branch instruction. Once branch instruction I2 is resolved to be a taken branch instruction, DCL 164 determines whether or not an entry in BTB 144 will be allocated for the branch. DCL 164 can use any of the factors discussed above to make this determination. In one embodiment, BTB 144 has available entries, so there is no need to replace an existing valid entry; in this case, an entry can be allocated for I2 without incurring any penalty. Note that control circuitry 172 uses lines 173 to identify which slot in BTB 144 is used to store the branch instruction address and the branch target address. If an entry is allocated, the instruction address and branch target address for I2 can be provided to BTB 144 via PC 110 and IRTA 120, respectively, and the valid field in BTB 144 associated with the loaded entry is asserted. Also, the PRED field of the allocated entry can be set to an initial default state, such as state 10, indicating WT. A cycle saving value associated with allocating a location in BTB 144 for I2 can be provided via BTB Entry Information 175.
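Allocation into a free entry, with PRED initialized to WT and CS set to the computed savings, can be sketched as follows. This is a minimal sketch under stated assumptions. The names `BTBEntry` and `allocate_free_entry` are hypothetical. Picking the first invalid entry is an assumed policy; the text leaves the choice of slot to control circuitry 172.

```python
from dataclasses import dataclass
from typing import List, Optional

WT = 0b10  # initial PRED state given as an example in the text


@dataclass
class BTBEntry:
    bia: int = 0         # branch instruction address
    bta: int = 0         # branch target address
    valid: bool = False  # V field
    pred: int = 0        # PRED field
    cs: int = 0          # CS field: cycles saved by having this branch in the BTB


def allocate_free_entry(btb: List[BTBEntry], bia: int, bta: int, cs: int) -> Optional[int]:
    """Fill the first invalid entry (a hypothetical choice); return its index.

    Returns None when every entry is valid, i.e. a replacement decision
    (such as table 201) is needed instead.
    """
    for i, entry in enumerate(btb):
        if not entry.valid:
            btb[i] = BTBEntry(bia=bia, bta=bta, valid=True, pred=WT, cs=cs)
            return i
    return None


btb = [BTBEntry() for _ in range(2)]
print(allocate_free_entry(btb, bia=0x8, bta=0x30, cs=1))   # 0 (I2 from FIG. 3)
print(allocate_free_entry(btb, bia=0x30, bta=0x50, cs=2))  # 1 (T0 from FIG. 5)
print(allocate_free_entry(btb, bia=0x40, bta=0x60, cs=1))  # None (BTB full)
```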

In the illustrated embodiment, DCL 164 counts the number of cycles from the time that the branch instruction (I2) can leave instruction register 116 (at the end of cycle 6) to the time that the target instruction (T0) can leave instruction register 116 (at the end of cycle 8). This number is 2. Had there been a BTB hit, however, as is the case in FIG. 4, the number of cycles from the time that the branch instruction (I2) can leave instruction register 116 (at the end of cycle 6) to the time that the target instruction (T0) can leave instruction register 116 (at the end of cycle 7) would be only 1. Therefore, a savings of 1 cycle (2 cycles − 1 cycle = 1 cycle) would be realized on a subsequent fetch of branch instruction I2 if I2 were present in BTB 144. Accordingly, if DCL 164 determines that branch instruction I2 is to be allocated in BTB 144, a value of 1 would be stored in the CS field of the entry identified for allocation. In one embodiment, DCL 164 uses this cycle savings information of 1 to decide whether or not to allocate. For example, in one embodiment, an entry may be allocated as long as the cycle savings is one or more cycles. Furthermore, a valid entry may already occupy the entry identified for allocation by control circuitry 172. In that case, in one embodiment, DCL 164 compares the 1-cycle savings that storing I2 into BTB 144 would realize to the value in the CS field of that valid entry, in order to determine whether I2 will replace the existing entry. Alternatively, as discussed above, other factors may be used in addition to or in place of cycle saving information to determine whether or not to allocate an entry for branch instruction I2 in BTB 144. One such factor is the prediction state of the identified valid entry to be replaced.
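The cycle-savings arithmetic, and a CS-only allocation check, can be sketched as below. This is a minimal sketch under stated assumptions. The names `cycle_savings` and `should_allocate` are hypothetical. The tie-break in `should_allocate` (replace when the new savings equals the victim's CS) is an assumption; the text only says the two values are compared. The final example applies the same arithmetic to the FIG. 5 timing: T0 is in instruction register 116 in cycle 8, and C0 reaches it in cycle 11 on a miss or cycle 9 on a hit.

```python
from typing import Optional


def cycle_savings(branch_leaves_ir: int, target_leaves_ir_on_miss: int,
                  target_leaves_ir_on_hit: int) -> int:
    """Cycles saved by a BTB hit: branch-to-target gap on a miss minus the gap on a hit."""
    miss_gap = target_leaves_ir_on_miss - branch_leaves_ir
    hit_gap = target_leaves_ir_on_hit - branch_leaves_ir
    return miss_gap - hit_gap


def should_allocate(new_cs: int, victim_cs: Optional[int] = None) -> bool:
    """Hypothetical CS-only policy.

    With no valid victim, allocate if at least one cycle is saved; otherwise
    replace unless the victim's CS value is larger (an assumed tie-break).
    """
    if new_cs < 1:
        return False
    return victim_cs is None or new_cs >= victim_cs


# FIGS. 3 and 4: I2 leaves IR at the end of cycle 6; T0 at the end of cycle 8 (miss) or 7 (hit).
i2_cs = cycle_savings(6, 8, 7)
print(i2_cs)                                # 1
print(should_allocate(i2_cs))               # True (free entry)
print(should_allocate(i2_cs, victim_cs=2))  # False (victim saves more)
# FIG. 5: T0 in IR in cycle 8; C0 reaches IR in cycle 11 (miss) or 9 (hit).
print(cycle_savings(8, 11, 9))              # 2
```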

Referring now to FIG. 4, during clock cycle 5, DCL 164 determines that I2 in slot 0 is a branch instruction. DCL 164 then determines that comparator match signal 197 was previously asserted for the address of I2 in clock cycle 2, indicating a hit in BTB 144, and that BTBTA 178 was selected by DCL 164 in clock cycle 3 to be driven onto address bus 128. In this case, an entry for branch instruction I2 is not allocated in BTB 144, since it is already present in BTB 144.

Referring back to FIG. 3, during the seventh cycle, instructions T0 and T1, corresponding to the branch target address $30 requested during the fifth clock cycle, are returned on data bus 130. Instruction I2 was decoded as a taken branch instruction during the sixth clock cycle. Some instructions are loaded into instruction register 116, slot 0, slot 1, etc. after the branch instruction has been decoded but before the branch target instructions are loaded into instruction buffer 105. Any such instructions are discarded and are not decoded by DCL 164 during the seventh cycle. This results in a stall condition in processor 184 (indicated by the asterisk in the seventh cycle), and I3, I4, and I5 are flushed.

During the eighth clock cycle, instructions T0 and T1, corresponding to the branch target address in memory 166, are loaded into instruction register 116 and slot 0 of prefetch buffer 102, respectively. Similarly, during the ninth clock cycle, instruction T1 from slot 0 is loaded into IR 116, etc., and DCL 164 performs the operations described above for the previous clock cycles.

Sequential instruction fetching of the target stream (not shown) continues in FIG. 3. For example, address $38 is driven onto address bus 128 in cycle 6 and address $40 is driven onto address bus 128 in cycle 7.

In one embodiment, as shown in FIGS. 1 and 4, BTB 144 has previously been loaded with a valid entry corresponding to instruction I2. BTB 144 holds the branch instruction address $8 in a BIA slot and the branch target address $30 in the corresponding BTA slot, and the associated valid bit is asserted. During the first clock cycle, instruction register 116 and slot 0 are reserved, and address $0 is driven onto address bus 128 to request instructions I0 and I1, located at address $0 and address $4 in memory 166. Comparator 170 also receives the address $0 driven onto address bus 128 and determines whether address $0 hits in BTB 144. In the current example, it is assumed that address $0 does not hit (i.e. misses) in BTB 144.

During the second clock cycle, slots 1 and 2 are reserved, and the sequential instruction address, $8, is driven onto address bus 128. Comparator 170 also receives $8 and determines whether address $8 hits in BTB 144. In the current example, it is assumed that address $8 does hit in BTB 144. Therefore, comparator 170 asserts BTB hit signals 196 and comparator match signal 197. Comparator 170 then selects the branch target address, $30, from the entry that caused the BTB hit. The selected branch target address is provided to MUX 126 using branch target buffer target address (BTBTA) 178.

During the third clock cycle, instructions I0 and I1, corresponding to the instruction address driven onto address bus 128 during the first clock cycle, are driven onto data bus 130. They are loaded into instruction register 116 and slot 0 of prefetch buffer 102 during the fourth clock cycle. Also in the third clock cycle, slots 3 and 4 of prefetch buffer 102 are reserved, and the branch target address, $30, is driven onto address bus 128. Comparator 170 also receives address $30 and determines whether it hits in BTB 144. In this example, it is assumed that $30 misses in BTB 144.

During the fourth clock cycle, instructions I0 and I1 are provided to instruction register 116 and slot 0, respectively. Sequential instruction fetching of the target stream (not shown) continues from this point on in FIG. 4.

During the fifth clock cycle, instructions T0 and T1, corresponding to the branch target address $30 driven onto address bus 128 during the third clock cycle, are driven onto data bus 130. They are loaded into slot 0 and slot 1 of prefetch buffer 102 during the sixth clock cycle. Instruction I1, which was in slot 0 during the fourth clock cycle, is loaded into IR 116. Instructions I2 and I3, which were on data bus 130 during the fourth clock cycle, are loaded into slot 0 and slot 1, respectively. During the sixth clock cycle, branch instruction I2 is loaded into IR 116 for execution. By the seventh clock cycle the new target stream has been established, and execution of T0 begins with no processor 184 stall between executing I2 and T0.

In one embodiment of the present invention, the code sequence and timing diagram shown in FIG. 3 can be viewed as part of the first iteration of a loop containing the segment of code listed in FIG. 3. FIG. 4 can be viewed as part of every subsequent iteration of the same loop. The BTB entry for branch instruction address $8 and branch target address $30 was loaded during the first iteration of the loop, as shown in FIG. 3. Note, therefore, that allocating an entry for branch instruction address $8 (instruction I2) saved a processor cycle. However, the entry replaced by branch instruction address $8 might have provided a greater cycle savings (such as 2 or more) by remaining in BTB 144. In that case, branch instruction address $8 may not have been allocated into BTB 144 in FIG. 3, even though it could save one cycle.

In one embodiment, as shown in FIGS. 1 and 5, a BTB miss may result in two processor stall cycles. In the example of FIG. 5, it is assumed that addresses $8 and $30 are not stored within BTB 144, resulting in BTB misses. The descriptions of clock cycles 1-7 are the same as those given above for FIG. 3 and are not repeated here.

Referring to cycle 8 in the example of FIG. 5, the target instructions (T0 and T1) of address $30, requested in cycle 5, are loaded into instruction register 116 and slot 0, respectively. The branch instruction at address $8 (I2) missed in BTB 144, so it was not identified as a branch instruction until DCL 164 examined it in slot 0. Therefore, DCL 164 used MUX 126 to select S0TA 124 (generated by branch address calculator 108) and drive the branch target address, $30, onto address bus 128. The target instruction T0, stored at the branch target address, is then returned via bus 168 and data bus 130 during the seventh clock cycle, along with the next sequential instruction, T1. T0 and T1 are therefore not loaded into instruction register 116 and slot 0 until cycle 8. Since instruction T0 is the target of a branch instruction, a change of flow is occurring, and T0 is loaded directly into instruction register 116. In the current example of FIG. 5, however, instruction T0 is also a branch instruction (BR $50).

Since T0 is loaded directly into instruction register 116, it is already in instruction register 116 by the time it is known to be a branch instruction. That is, the branch lookahead scheme could not identify it as a branch instruction any earlier, because it never passed through slot 0. Therefore, branch address calculator 108 provides $50 as IRTA 120, which is driven via MUX 126 onto address bus 128 in cycle 8. The target instruction of T0, C0 (along with the subsequent sequential instruction C1), is not returned on the data bus until cycle 10. C0 and C1 are therefore not loaded into instruction register 116 and slot 0 until cycle 11. This results in two processor stall cycles, during which instruction register 116 is waiting to receive the target instruction stream beginning with C0.

Note that if instruction T0 (BR $50) had been stored in BTB 144, a BTB hit rather than a BTB miss would have occurred in cycle 5. The target address of T0, $50, would then have been driven onto address bus 128 in cycle 6. C0 and C1 would have been returned on the data bus in cycle 8 and loaded into instruction register 116 and slot 0 in cycle 9. In that case, no processor stalls would have occurred, since T0 is present in instruction register 116 in cycle 8 and C0 would have reached instruction register 116 in cycle 9, rather than in cycle 11 (as in the example of FIG. 5). That is, storing instruction T0 in BTB 144 would have saved 2 data processor cycles, since no processor stall cycles would have occurred between T0 and C0. Therefore, in the example of FIG. 5, once T0 is actually resolved to be a taken branch (which occurs in cycle 8), it is loaded into BTB 144 in the next cycle (cycle 9), with a value of 2 stored in its corresponding CS field. In one embodiment, this allocation of T0 into BTB 144 may not occur if, for example, the branch instruction in the existing entry offers a cycle savings greater than 2. (Also, as described above in reference to FIG. 3, I2 may also have been loaded into BTB 144 in cycle 7, after being resolved as actually taken in cycle 6, with a value of 1 stored in its corresponding CS field. That is, as described above in reference to FIGS. 3 and 4, storing I2 in BTB 144 would have saved 1 data processor cycle.)

Sequential instruction fetching of the target stream starting at $50 then continues in FIG. 5 (further details are not shown). For example, address $58 is driven onto address bus 128 in cycle 9 and address $60 is driven onto address bus 128 in cycle 10.

In one embodiment, as illustrated in FIG. 1, an additional postponement flag (PF) may be stored for each entry in BTB 144 and used in making allocation decisions. The PF can indicate whether or not replacement of the BTB entry has been postponed since the last time the branch was taken. If it has, then the next time the BTB entry is a candidate for allocation on a BTB miss, replacement of this entry may occur anyway. That is, even if allocation of a particular entry would normally not occur, the entry may be allocated regardless if its PF is set to 1. In one embodiment, the PF is a 1-bit flag that, when set, indicates that replacement of that entry was postponed since the last time the branch was taken. (Note that, in one embodiment, a PF may be stored for only a subset of the entries in BTB 144 rather than for every entry.)

For example, for a particular branch instruction that missed in BTB 144 and was later resolved as taken, a decision is made whether or not to allocate an entry for the branch in BTB 144. In one embodiment, table 201 of FIG. 6 is used to make the allocation decision. Both the relative cycle savings and the predictor state of the entry in BTB 144 that is to be replaced are used to determine whether allocation (i.e. replacement) occurs.

In one embodiment using a PF, if the entry is determined not to be replaced (i.e. the new branch instruction is not allocated into BTB 144), the PF of the entry of BTB 144 that was to be replaced is set to “1”. The next time that entry is selected for allocation (assuming its PF is still set to “1”), the allocation occurs and that entry is replaced, regardless of what table 201 of FIG. 6 indicates. That is, the PF can be used to selectively override a replacement decision defined by the predetermined allocation criteria (such as the allocation criteria provided by table 201). Alternatively, when that entry is selected for allocation (assuming its PF is still set to “1”), an alternate decision table may be used, such as allocation decision table 210 of FIG. 7. This alternate decision table may, for example, allow allocation to occur under more conditions. Also, in one embodiment, the PF of a particular entry is cleared if a BTB hit of that entry occurs, indicating that the branch corresponding to the entry may be taken. In an alternate embodiment, the PF of a hitting entry is cleared only if the branch corresponding to that entry is actually taken. Note that some embodiments may use an allocation table, such as the allocation tables of FIG. 6 or FIG. 7, without a PF. In those embodiments, replacement may be postponed indefinitely for some combinations in the table.
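The 1-bit PF override can be sketched as a wrapper around the table 201 rule. This is a minimal sketch under stated assumptions. The names `Entry`, `decide_with_pf`, and `on_btb_hit` are hypothetical. Clearing PF on any hit follows the first embodiment above. When the function returns True, the caller is assumed to write a fresh entry for the new branch with PF = 0.

```python
from dataclasses import dataclass

SNT, WNT, WT, ST = 0b00, 0b01, 0b10, 0b11


def table_201_replace(pred: int, diff: int) -> bool:
    """Table 201 thresholds on diff = new CS - victim CS."""
    return {SNT: True, WNT: diff >= 1, WT: diff >= 2, ST: diff > 2}[pred]


@dataclass
class Entry:
    pred: int    # PRED field of the entry identified for replacement
    cs: int      # CS field of that entry
    pf: int = 0  # 1-bit postponement flag


def decide_with_pf(victim: Entry, new_cs: int) -> bool:
    """Return True if the victim is replaced by the new taken branch."""
    if victim.pf == 1:
        return True  # replacement was already postponed once: override table 201
    replace = table_201_replace(victim.pred, new_cs - victim.cs)
    if not replace:
        victim.pf = 1  # record the postponement
    return replace


def on_btb_hit(entry: Entry) -> None:
    """Clear PF on a hit (an alternate embodiment clears it only on a taken hit)."""
    entry.pf = 0


victim = Entry(pred=ST, cs=2)
print(decide_with_pf(victim, new_cs=1))  # False: postponed, PF set to 1
print(decide_with_pf(victim, new_cs=1))  # True: PF overrides table 201
```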

FIG. 7 illustrates another allocation decision table, table 210. Like table 201 of FIG. 6, it uses both the relative cycle savings and the predictor state of the entry to be replaced to determine whether or not to allocate a new entry. In the case of FIG. 7, however, the entry identified for allocation is almost always replaced. The only exception is when its predictor state indicates ST and the relative cycle savings is either 0 or 1. Table 210 of FIG. 7 therefore allows allocation to occur under more conditions than table 201 of FIG. 6. Accordingly, in one embodiment, when an entry is identified for allocation (i.e. replacement) and its PF is set to 0, the criteria of table 201 of FIG. 6 may be used to determine whether the allocation occurs. However, if the PF of the entry identified for allocation is set to 1, the criteria of table 210 of FIG. 7 may be used instead.
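Selecting between the two tables by PF value can be sketched as follows. This is a minimal sketch under stated assumptions. The function names are hypothetical. Table 210 is encoded from the prose description only. Treating a negative relative savings for an ST victim the same as 0 (no replacement) is an assumption, because the exact columns of FIG. 7 are not reproduced here.

```python
SNT, WNT, WT, ST = 0b00, 0b01, 0b10, 0b11


def table_201_replace(pred: int, diff: int) -> bool:
    """Table 201 thresholds on diff = new CS - victim CS."""
    return {SNT: True, WNT: diff >= 1, WT: diff >= 2, ST: diff > 2}[pred]


def table_210_replace(pred: int, diff: int) -> bool:
    """Table 210: replace unless the victim is ST and diff is 0 or 1.

    Assumption: a negative diff for an ST victim is treated like 0 (no replacement).
    """
    return not (pred == ST and diff <= 1)


def replace_decision(pred: int, pf: int, new_cs: int, victim_cs: int) -> bool:
    """Use table 201 when PF is 0 and the more permissive table 210 when PF is 1."""
    table = table_210_replace if pf else table_201_replace
    return table(pred, new_cs - victim_cs)


for pred, name in ((WT, "WT"), (ST, "ST")):
    print(name, [replace_decision(pred, pf, new_cs=3, victim_cs=1) for pf in (0, 1)])
# WT [True, True]
# ST [False, True]
```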

In one embodiment, each of tables 201 and 210 provides BTB replacement decisions based on particular criteria. For each combination of a relative cycle savings value and a predictor state, each table provides a decision on whether to replace an identified entry in the BTB. Each replacement decision of table 201 or 210 therefore has a value indicating whether replacement or allocation is to occur under a particular criterion (i.e. particular values of the factors being used). A first set of BTB replacement decisions can thus be used if the PF has a first value (e.g. the BTB replacement decisions of table 201), and a second set can be used if the PF has a second value (e.g. the BTB replacement decisions of table 210).

In one embodiment, each of the tables of FIGS. 6 and 7 may be set up differently, using different factors, as discussed in more detail above. For example, a table may include a single row or a single column, in which case only one factor is used to determine whether allocation occurs. Alternatively, the tables may be set up using different factors for replacement. Also, each table may be programmable; for example, its replacement decisions may be programmed by a user or based on software profiling. Also, in one embodiment, an allocation decision table may be used to determine whether allocation occurs when the PF of the entry to be replaced is 0, while allocation may always be performed, regardless of any criteria, when the PF of the entry to be replaced is 1. In this case, a second table, such as the table of FIG. 7, may not be needed. Note also that each of the tables of FIGS. 6 and 7 may be implemented in a variety of ways within DCL 164 (e.g. as a look-up table, as combinational logic, as a state machine, etc.). In addition, DCL 164 can provide information to CTRL 172 (which updates allocated entries in BTB 144), via replacement flag control signal 177, for setting or clearing the PF of an entry. Therefore, a first set of criteria can be used to determine whether replacement of an identified entry occurs when the PF has a first value, and a second set of criteria can be used to determine whether replacement of the identified entry occurs when the PF has a second value, different from the first value. Note that the second set of criteria can indicate to always replace.
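The programmability and the single-table variant described above might be modeled as follows. `ProgrammableDecisionTable` and `decide` are hypothetical names used only for this sketch; a hardware implementation in DCL 164 could equally be combinational logic or a state machine.

```python
class ProgrammableDecisionTable:
    """Hypothetical programmable replacement-decision table.

    Each cell holds True (replace) or False (postpone). A table with a
    single row or a single column uses only one factor.
    """

    def __init__(self, rows: int, cols: int, default: bool = True) -> None:
        self.cells = [[default] * cols for _ in range(rows)]

    def program(self, row: int, col: int, replace: bool) -> None:
        """Write one decision, e.g. by a user or after software profiling."""
        self.cells[row][col] = replace

    def lookup(self, row: int, col: int) -> bool:
        return self.cells[row][col]


def decide(table: ProgrammableDecisionTable, pf: int, row: int, col: int) -> bool:
    """Single-table variant: PF == 1 always replaces, so no second table is needed."""
    if pf == 1:
        return True
    return table.lookup(row, col)


# Single-factor example: one row indexed only by an assumed relative savings 0..3.
savings_only = ProgrammableDecisionTable(rows=1, cols=4)
savings_only.program(0, 0, False)  # postpone when there is no relative savings
print(decide(savings_only, pf=0, row=0, col=0))  # False
print(decide(savings_only, pf=1, row=0, col=0))  # True
```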

In another embodiment, the PF for each entry may be implemented as a count value. For example, the PF may be a 2-bit count value such that replacement of an entry can be postponed more than once since the last time the branch of that entry was taken. In one embodiment, when an entry to be replaced has a PF count value of 0 (which is the initial value of the PF of an entry, or the value upon that entry resulting in a hit in BTB 144, or, in another embodiment, the value upon that entry resulting in a BTB hit for a taken branch), table 201 of FIG. 6 may be used to determine whether the entry which has been identified for allocation is to be replaced. In this example, when the entry to be replaced has a PF count value of 1, table 210 of FIG. 7 may be used to determine whether the identified entry is to be replaced, and when the entry to be replaced has a PF count value of 2 or more, the entry may be replaced regardless of any criteria. Each time an allocation decision is made not to replace an identified entry, DCL 164, via replacement flag control 177, can increment the corresponding PF by one. Also, each time an entry results in a hit, or, alternatively, in a hit for a taken branch, the PF count value can be cleared back to 0. Therefore, in the case of a multi-bit PF, a first set of criteria can be used to determine whether replacement of an identified entry occurs when the PF has a first value or range of values, a second set of criteria can be used when the PF has a second value or range of values, and a third set of criteria can be used when the PF has a third value or range of values. Note that the first, second, and third values or ranges of values may be mutually exclusive. Also, the third set of criteria can indicate to always replace (regardless of any criteria).
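The 2-bit count-value embodiment can be sketched as a function that picks the criteria from the PF count and returns the updated count. The criteria are passed in as callables because their contents (tables 201 and 210) are defined elsewhere; the names `decide_with_count`, `on_hit` and `never_replace` are hypothetical.

```python
from typing import Callable, Tuple

# (predictor state, relative cycle savings) -> True if the entry should be replaced
Criteria = Callable[[int, int], bool]


def decide_with_count(pf: int, pred: int, savings: int,
                      first: Criteria, second: Criteria) -> Tuple[bool, int]:
    """Return (replace?, PF value to hold in the entry afterward)."""
    if pf == 0:
        replace = first(pred, savings)   # e.g. table 201 of FIG. 6
    elif pf == 1:
        replace = second(pred, savings)  # e.g. table 210 of FIG. 7
    else:
        replace = True                   # PF >= 2: replace regardless of criteria
    if replace:
        return True, 0                   # the newly allocated entry starts at 0
    # Postponed: count it. The value never exceeds 2, so it fits in 2 bits.
    return False, pf + 1


def on_hit(pf: int, taken: bool, only_if_taken: bool = False) -> int:
    """Clear the count on a hit, or only on a taken hit in the alternate embodiment."""
    if taken or not only_if_taken:
        return 0
    return pf


def never_replace(pred: int, savings: int) -> bool:
    """Stand-in criteria that always postpone, to show the count advancing."""
    return False


pf = 0
for _ in range(3):
    replace, pf = decide_with_count(pf, 3, 0, never_replace, never_replace)
    print(replace, pf)
# False 1
# False 2
# True 0
```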

In one embodiment, the determination as to whether to set (or increment) the PF of a particular entry can be made based on the predictor state of the entry. For example, in a BTB having an allocation policy in which all taken branches are allocated, the PF of an entry to be replaced can be set to “0” if its predictor indicates ST. In this manner, the allocation policy to always allocate on taken branches is selectively overridden for some BTB entries, allowing the replacement of the entry identified for allocation to be postponed at least once. Alternatively, when a BTB hit occurs and the branch is not taken, the PF of the existing entry may be set to “1” to preclude postponement in the future if the entry is selected for allocation before the entry's branch is taken again. Therefore, the PF can be used to postpone allocation in a variety of ways and for a variety of reasons.
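One possible reading of this embodiment, in which PF = 0 marks an entry as still eligible for one postponement and PF = 1 forces replacement under the always-allocate-taken-branches policy, is sketched below. The encoding `ST = 3`, the `Entry` class and the function names are assumptions for illustration only.

```python
from dataclasses import dataclass

ST = 3  # assumed encoding of "strongly taken" in a 2-bit predictor


@dataclass
class Entry:
    pred: int  # predictor state, 0..3 (assumed encoding)
    pf: int    # 0 = may be postponed once, 1 = replace on next attempt


def pf_for_predictor(pred: int) -> int:
    """Only ST entries start out eligible for postponement."""
    return 0 if pred == ST else 1


def on_taken_miss(victim: Entry) -> bool:
    """Base policy allocates every taken branch; PF == 0 postpones once."""
    if victim.pf == 0:
        victim.pf = 1
        return False
    return True


def on_not_taken_hit(entry: Entry) -> None:
    """Set PF to 1 so a later allocation will not postpone this entry."""
    entry.pf = 1


strong = Entry(pred=ST, pf=pf_for_predictor(ST))
weak = Entry(pred=1, pf=pf_for_predictor(1))
print(on_taken_miss(strong), on_taken_miss(weak))  # False True
```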

By now it should be appreciated that there has been provided a method for improved allocation in which the decision whether or not to allocate can be made based on a variety of different factors. In one embodiment, these factors include information on the entry which is identified to be replaced by the allocation, such as the predictor state of that entry. In one embodiment, the criteria for determining allocation may include cycle savings information (processor or clock cycle savings information) with respect to the new branch instruction to be stored in the entry, or relative cycle savings information between the new branch to be stored in the BTB and the branch to be replaced. Therefore, in one embodiment, cycle savings information is stored for each branch that is stored into the BTB. Also, in one embodiment, a postponement flag can be used to postpone allocation so as not to remove possibly useful entries from the BTB. For example, based on a value of the postponement flag, a replacement decision made according to predetermined allocation criteria (based on one or more factors, such as the allocation criteria provided in table 201 or table 210) can be selectively overridden.

Because the apparatus implementing the present invention is, for the most part, composed of electronic components and circuits known to those skilled in the art, circuit details will not be explained to any greater extent than that considered necessary as illustrated above, for the understanding and appreciation of the underlying concepts of the present invention and in order not to obfuscate or distract from the teachings of the present invention.

Some of the above embodiments, as applicable, may be implemented using a variety of different information processing systems. For example, although FIG. 1 and the discussion thereof describe an exemplary information processing architecture, this exemplary architecture is presented merely to provide a useful reference in discussing various aspects of the invention. Of course, the description of the architecture has been simplified for purposes of discussion, and it is just one of many different types of appropriate architectures that may be used in accordance with the invention. Those skilled in the art will recognize that the boundaries between logic blocks are merely illustrative and that alternative embodiments may merge logic blocks or circuit elements or impose an alternate decomposition of functionality upon various logic blocks or circuit elements. Furthermore, note that FIG. 1 may illustrate only a portion of processor 12, where processor 12 may include other known circuit elements, such as, for example, execution units, register files, etc.

Thus, it is to be understood that the architectures depicted herein are merely exemplary, and that in fact many other architectures can be implemented which achieve the same functionality. In an abstract, but still definite sense, any arrangement of components to achieve the same functionality is effectively “associated” such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermedial components. Likewise, any two components so associated can also be viewed as being “operably connected,” or “operably coupled,” to each other to achieve the desired functionality.

Also for example, in one embodiment, the illustrated elements of system 100 are circuitry located on a single integrated circuit or within a same device. Alternatively, system 10 may include any number of separate integrated circuits or separate devices interconnected with each other. For example, memory 166 may be located on a same integrated circuit as processor 184 or on a separate integrated circuit or located within another peripheral or slave discretely separate from other elements of system 10. Also for example, system 100 or portions thereof may be soft or code representations of physical circuitry or of logical representations convertible into physical circuitry. As such, system 100 may be embodied in a hardware description language of any appropriate type.

Furthermore, those skilled in the art will recognize that boundaries between the functionality of the above described operations are merely illustrative. The functionality of multiple operations may be combined into a single operation, and/or the functionality of a single operation may be distributed in additional operations. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in various other embodiments.

In one embodiment, system 100 is a computer system such as a personal computer system. Other embodiments may include different types of computer systems. Computer systems are information handling systems which can be designed to give independent computing power to one or more users. Computer systems may be found in many forms including but not limited to mainframes, minicomputers, servers, workstations, personal computers, notepads, personal digital assistants, electronic games, automotive and other embedded systems, cell phones and various other wireless devices. A typical computer system includes at least one processing unit, associated memory and a number of input/output (I/O) devices.

A computer system processes information according to a program and produces resultant output information via I/O devices. A program is a list of instructions such as a particular application program and/or an operating system. A computer program is typically stored internally on a computer readable storage medium or transmitted to the computer system via a computer readable transmission medium. A computer process typically includes an executing (running) program or portion of a program, current program values and state information, and the resources used by the operating system to manage the execution of the process. A parent process may spawn other, child processes to help perform the overall functionality of the parent process. Because the parent process specifically spawns the child processes to perform a portion of the overall functionality of the parent process, the functions performed by child processes (and grandchild processes, etc.) may sometimes be described as being performed by the parent process.

Although the invention is described herein with reference to specific embodiments, various modifications and changes can be made without departing from the scope of the present invention as set forth in the claims below. For example, criteria other than those set forth above may be used to determine whether or not to replace an existing entry upon allocation. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present invention. Any benefits, advantages, or solutions to problems that are described herein with regard to specific embodiments are not intended to be construed as a critical, required, or essential feature or element of any or all the claims.

The term “coupled,” as used herein, is not intended to be limited to a direct coupling or a mechanical coupling.

Furthermore, the terms “a” or “an,” as used herein, are defined as one or more than one. Also, the use of introductory phrases such as “at least one” and “one or more” in the claims should not be construed to imply that the introduction of another claim element by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim element to inventions containing only one such element, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an.” The same holds true for the use of definite articles.

Unless stated otherwise, terms such as “first” and “second” are used to arbitrarily distinguish between the elements such terms describe. Thus, these terms are not necessarily intended to indicate temporal or other prioritization of such elements.

Additional Text:

-   1. A method for branch target buffer (BTB) allocation in a pipelined data processing system, comprising:
    -   receiving a branch instruction and determining a branch target address corresponding to the branch instruction;
    -   determining whether the branch target address is presently stored in a branch target buffer (BTB), wherein when the branch target address is not presently stored in the branch target buffer, identifying an entry in the branch target buffer as an identified entry to receive the branch target address; and
    -   using a value in a field within the identified entry in the branch target buffer to selectively override a replacement decision defined by predetermined branch target buffer allocation criteria.
-   2. The method of statement 1 further comprising:
    -   storing the branch target address within the identified entry in response to the value in the field.
-   3. The method of statement 1 further comprising:
    -   using a first replacement value within a first set of replacement decisions when the value assumes a first value to thereby determine whether to selectively override the replacement decision; and
    -   using a second replacement value within a second set of replacement decisions that is different from the first set of replacement decisions when the value assumes a second value to thereby determine whether to selectively override the replacement decision.
-   4. The method of statement 3 further comprising:
    -   always replacing the identified entry with the branch target address in response to determining that a branch is taken and the value in the field within the identified entry has a predetermined value.
-   5. The method of statement 3 further comprising:
    -   implementing the second set of replacement decisions to always allocate the identified entry in the branch target buffer to receive the branch target address.
-   6. The method of statement 3 further comprising:
    -   counting a number of times that postponement of the identified entry in the branch target buffer occurs to form the value as a count value; and
    -   when the count value reaches a predetermined number, always allocating the identified entry in the branch target buffer if a branch is taken.
-   7. The method of statement 3 further comprising:
    -   making the first set of replacement decisions and the second set of replacement decisions user programmable, the first set of replacement decisions and the second set of replacement decisions being represented in a table.
-   8. The method of statement 1 further comprising:
    -   when the branch target address is presently stored within the branch target buffer, clearing the value in the field corresponding to the entry storing the branch target address if a branch is taken.
-   9. A system comprising:
    -   branch address circuitry for forming a branch target address corresponding to an instruction;
    -   a branch target buffer (BTB) having a plurality of entries, at least one of the plurality of entries having a postponement flag field comprising a value for indicating whether the at least one of the plurality of entries has previously not been allocated with a new branch target address; and
    -   control logic circuitry coupled to the branch address circuitry and the BTB for determining whether the branch target address is stored in the BTB, the control logic circuitry identifying an allocation entry in the BTB for allocation to receive the branch target address in response to a miss in the BTB, the control logic circuitry allocating the entry using a first branch target buffer allocation criteria based on a first value of the postponement flag field and using a second branch target buffer allocation criteria based on a second value of the postponement flag field.
-   10. The system of statement 9 wherein the control logic circuitry further comprises logic that allocates the entry by determining whether to replace the identified entry using the first branch target buffer allocation criteria based on the first value of the postponement flag and using the second branch target buffer allocation criteria based on the second value of the postponement flag field.
-   11. The system of statement 9 further comprising:
    -   a first programmable table for storing the first branch target buffer allocation criteria to be used in response to the first value of the postponement flag; and
    -   a second programmable table for storing the second branch target buffer allocation criteria to be used in response to the second value of the postponement flag.
-   12. The system of statement 9 wherein the postponement flag field further comprises:
    -   at least two bits wherein a first field value represents the value for indicating whether the at least one of the plurality of entries has previously not been allocated with the new branch target address, a second field value represents the value for indicating use of the second branch target buffer allocation criteria, and a third field value indicates that the entry must be allocated.
-   13. The system of statement 9 wherein the postponement flag field further comprises:
    -   at least two bits wherein a first range of values indicates that the first branch target buffer allocation criteria is used to determine allocation of the entry, a second range of values indicates that the second branch target buffer allocation criteria is used to determine allocation of the entry, and a third range of values indicates that a third branch target buffer allocation criteria is used to determine allocation of the entry.
-   14. A method comprising:
    -   receiving a branch instruction at an input of a processor;
    -   determining a branch target address corresponding to the branch instruction;
    -   determining whether the branch target address is presently stored in a branch target buffer (BTB), wherein when the branch target address is not presently stored in the branch target buffer, identifying an entry in the branch target buffer as an identified entry to receive the branch target address; and
    -   using a postponement flag corresponding to the identified entry to indicate whether or not replacement of the entry that was identified was postponed since a branch associated with the entry was last taken.
-   15. The method of statement 14 further comprising:
    -   using a first set of branch target buffer replacement decisions when the postponement flag assumes a first value; and
    -   using a second set of branch target buffer replacement decisions that is different from the first set of branch target buffer replacement decisions when the postponement flag assumes a second value.
-   16. The method of statement 15 further comprising:
    -   implementing the second set of branch target buffer replacement decisions to always allocate the identified entry in the branch target buffer to receive the branch target address if a branch is taken.
-   17. The method of statement 15 further comprising:
    -   making the first set of branch target buffer replacement decisions and the second set of branch target buffer replacement decisions programmable, the first set of branch target buffer replacement decisions and the second set of branch target buffer replacement decisions being represented in a table.
-   18. The method of statement 14 further comprising:
    -   always replacing the identified entry in the branch target buffer with the branch target address in response to determining that the value of the postponement flag has a predetermined value.
-   19. The method of statement 14 further comprising:
    -   counting a number of times that postponement of the identified entry occurs to form the postponement flag as a count value; and
    -   when the count value reaches a predetermined number, always allocating the identified entry if a branch is taken.
-   20. The method of statement 14 further comprising:
    -   implementing the postponement flag with at least two bits, wherein a first range of values provided by the at least two bits indicates that a first branch target buffer allocation criteria is used to determine allocation of the identified entry, a second range of values provided by the at least two bits indicates that a second branch target buffer allocation criteria is used to determine allocation of the identified entry, and a third range of values provided by the at least two bits indicates that a third branch target buffer allocation criteria is used to determine allocation of the entry.

1. A method for branch target buffer (BTB) allocation in a pipelined data processing system, comprising: receiving a branch instruction and determining a branch target address corresponding to the branch instruction; determining whether the branch instruction hits or misses in a branch target buffer (BTB), wherein when the branch instruction results in a miss in the branch target buffer, identifying an entry in the branch target buffer as an identified entry to receive the branch target address; and using a value in a field within the identified entry in the branch target buffer to selectively override a replacement decision defined by predetermined branch target buffer allocation criteria, wherein the replacement decision indicates whether or not to allocate the identified entry to store information of the branch instruction which resulted in the BTB miss, wherein the information of the branch instruction includes the branch target address.
2. The method of claim 1 further comprising: storing the information of the branch instruction within the identified entry in response to the value in the field.
3. The method of claim 1 further comprising: using a first replacement value within a first set of replacement decisions when the value assumes a first value to thereby determine whether to selectively override the replacement decision; and using a second replacement value within a second set of replacement decisions that is different from the first set of replacement decisions when the value assumes a second value to thereby determine whether to selectively override the replacement decision.
4. The method of claim 3 further comprising: always storing the information of the branch instruction into the identified entry in response to determining that a branch is taken and the value in the field within the identified entry has a predetermined value.
5. The method of claim 3 further comprising: implementing the second set of replacement decisions to always allocate the identified entry in the branch target buffer to store the information of the branch instruction.
6. The method of claim 3 further comprising: counting a number of times that postponement of the identified entry in the branch target buffer occurs to form the value as a count value; and when the count value reaches a predetermined number, always allocating the identified entry in the branch target buffer to store the information of the branch instruction if a branch is taken.
7. The method of claim 3 further comprising: making the first set of replacement decisions and the second set of replacement decisions user programmable, the first set of replacement decisions and the second set of replacement decisions being represented in a table.
8. The method of claim 1 further comprising: when the branch target address is presently stored within the branch target buffer, clearing the value in the field corresponding to the entry storing the branch target address if a branch is taken.
9. A system comprising: branch address circuitry for forming a branch target address corresponding to an instruction; a branch target buffer (BTB) having a plurality of entries, at least one of the plurality of entries having a postponement flag field comprising a value for indicating whether the at least one of the plurality of entries has previously not been allocated with a new branch target address; and control logic circuitry coupled to the branch address circuitry and the BTB for determining whether the branch target address is stored in the BTB, the control logic circuitry identifying an allocation entry in the BTB for allocation to receive the branch target address in response to a miss in the BTB, the control logic circuitry allocating the entry using a first branch target buffer allocation criteria based on a first value of the postponement flag field and using a second branch target buffer allocation criteria based on a second value of the postponement flag field.
10. The system of claim 9 wherein the control logic circuitry further comprises logic that allocates the entry by determining whether to replace the identified entry using the first branch target buffer allocation criteria based on the first value of the postponement flag and using the second branch target buffer allocation criteria based on the second value of the postponement flag field.
11. The system of claim 9 further comprising: a first programmable table for storing the first branch target buffer allocation criteria to be used in response to the first value of the postponement flag; and a second programmable table for storing the second branch target buffer allocation criteria to be used in response to the second value of the postponement flag.
12. The system of claim 9 wherein the postponement flag field further comprises: at least two bits wherein a first field value represents the value for indicating whether the at least one of the plurality of entries has previously not been allocated with the new branch target address, a second field value represents the value for indicating use of the second branch target buffer allocation criteria, and a third field value indicates that the entry must be allocated.
13. The system of claim 9 wherein the postponement flag field further comprises: at least two bits wherein a first range of values indicates that the first branch target buffer allocation criteria is used to determine allocation of the entry, a second range of values indicates that the second branch target buffer allocation criteria is used to determine allocation of the entry, and a third range of values indicates that a third branch target buffer allocation criteria is used to determine allocation of the entry.
14. A method comprising: receiving a branch instruction at an input of a processor; determining a branch target address corresponding to the branch instruction; determining whether the branch instruction hits or misses in a branch target buffer (BTB), wherein when the branch instruction results in a miss in the branch target buffer, identifying an entry in the branch target buffer as an identified entry to receive the branch target address; and using a postponement flag corresponding to the identified entry to indicate whether or not replacement of the entry that was identified was postponed since a branch associated with the entry was last taken.
15. The method of claim 14 further comprising: using a first set of branch target buffer replacement decisions when the postponement flag assumes a first value; and using a second set of branch target buffer replacement decisions that is different from the first set of branch target buffer replacement decisions when the postponement flag assumes a second value.
16. The method of claim 15 further comprising: implementing the second set of branch target buffer replacement decisions to always allocate the identified entry in the branch target buffer to receive the branch target address if a branch is taken.
17. The method of claim 15 further comprising: making the first set of branch target buffer replacement decisions and the second set of branch target buffer replacement decisions programmable, the first set of branch target buffer replacement decisions and the second set of branch target buffer replacement decisions being represented in a table.
18. The method of claim 14 further comprising: always replacing the identified entry in the branch target buffer with the branch target address in response to determining that the value of the postponement flag has a predetermined value.
19. The method of claim 14 further comprising: counting a number of times that postponement of the identified entry occurs to form the postponement flag as a count value; and when the count value reaches a predetermined number, always allocating the identified entry if a branch is taken.
20. The method of claim 14 further comprising: implementing the postponement flag with at least two bits, wherein a first range of values provided by the at least two bits indicates that a first branch target buffer allocation criteria is used to determine allocation of the identified entry, a second range of values provided by the at least two bits indicates that a second branch target buffer allocation criteria is used to determine allocation of the identified entry, and a third range of values provided by the at least two bits indicates that a third branch target buffer allocation criteria is used to determine allocation of the entry.